Commentary: The Chinese Human Genome Diversity Project

Author

  • L. Luca Cavalli-Sforza
Abstract

The Chinese population comprises one-fifth of the human species. The Chinese government officially recognizes 56 ethnic groups: the Han majority (about 1.1 billion people) and 55 ethnic minorities (totaling about 100 million). The minorities are spread over most of China, but especially the south. Close to half of them are found in just one of the 28 provinces of China, Yunnan. The distinction among groups is primarily linguistic but corresponds closely to other cultural differences.

The paper by Chu et al. published in this issue of the Proceedings (1) explores the genetic stratification of about half of the official ethnic subdivisions by means of microsatellites, a recently discovered class of genetic markers that has proved very useful for several purposes. The paper represents the collective effort of several institutes participating in the Chinese Human Genome Diversity Project (CHGDP). The broader Human Genome Diversity Project (HGDP) was initiated in 1991 by the international Human Genome Organization (HUGO) and is regionally organized (see http://www.stanford.edu/group/morrinst/HGDP/html). The CHGDP has started collecting cell lines from the official ethnic groups and testing their DNA. The 56 official ethnic groups do not exhaust current Chinese diversity, as more than 100 languages are spoken in China, but they include the most important ones.

Microsatellites are repeats of short DNA segments, in practice fewer than five nucleotides long. They have a high mutation rate and therefore a large number of alleles, which makes them perhaps three times more informative on average than the most common type of genetic polymorphism, single-nucleotide substitutions, which are mostly biallelic. They are used very widely in genetic linkage studies and have begun to be used in evolutionary analyses (e.g., refs. 2–4).

Thirty microsatellites were tested by Chu et al. (1) for reconstructing a tree of 14 East Asian populations, which were studied along with 11 populations of a standard set representing the rest of the world. A subset of 15 of the same microsatellites was used to construct a second tree from 32 East Asian populations. These include the first 14 and are compared with the same 11 populations from the rest of the world.

Bootstrap values (5, 6), which measure the reproducibility of the tree branchings on a scale from 0 to 100, are high in both trees for the fewer populations outside East Asia, which are rather remote both geographically and genetically from each other. These comparisons present the greatest genetic divergence, and their analysis by tree is therefore more reproducible. Results agree closely with a previous comparable analysis (2). The comparisons among East Asian populations involve much smaller genetic differences and, as expected, bootstrap values are much smaller. Because of their closer geographic proximity, these populations are also likely to have had much greater reciprocal gene flow than the more distant populations from the rest of the world. Studying populations much closer geographically and genetically puts analysis by tree to a more severe test.

Even so, all East Asian populations cluster together in both trees. Their nearest genetic neighbors from the rest of the world are, not surprisingly, Native Americans. Slightly more distant genetically is the small cluster formed by Australian aborigines and New Guineans, in agreement with the fact that Australia was settled before the Americas and its populations have had more time to differentiate (7, 8).

The first outlier within the East Asian cluster of the first tree is the Cambodian branch, and the second is a small cluster made of two Altaic language-speaking populations (Buryat and Yakut). These populations live not too far from China, south and north of it, respectively. The other 11 East Asian populations form two fairly sharp clusters.
One includes four Taiwan aborigine groups and two Chinese ethnic minorities from the western part of Yunnan province. The other cluster includes Korean, Manchu, Japanese, and two groups of Han (one from Yunnan and the other from the United States). Most Chinese immigrants to the U.S. (and to other places, like Singapore, Malaysia, the Philippines, Taiwan, etc.) come from southern China, and this is certainly true of the cell lines from California residents born in mainland China, collected by Louise Chen and Alice Lin at Stanford and used in our surveys (2, 7, 8). Han living in the south of China mostly came originally from the north, but they did so at very different times, and thus had different lengths of time to receive gene flow from the earlier settlers, that is, the minorities. In general, there is a correlation between the average genotype for protein polymorphisms of Han from the different provinces and that of local minorities, but there are exceptions (R. Du, H. Chungtze, E. Minch, and L.L.C.-S., unpublished work).

The second tree is based on more populations but fewer microsatellites, and bootstrap values are inevitably lower in the East Asian part of the tree. Conclusions therefore must be taken with greater caution. The southern group of populations falls into three clusters. S1 contains all four Taiwan aborigine groups and five Yunnan ethnic minorities. S2 contains Cambodians and six ethnic minorities from various southern provinces other than Yunnan, and also Han from the province of Henan, a north-central province on the north-south boundary. S3 is the tightest cluster and is made up of only two minorities, both from western Yunnan. The northern group of populations falls into two clusters, N1 and N2. N1 is a classical northern cluster, with Japanese, Manchu, Korean, and Siberian populations. The Chinese in it are Han from the North (the northern Chinese by definition) and Han from Yunnan, probably late immigrants who had no time to receive gene flow from the local people.
There are also the Uyghur from Xinjiang province at the extreme west of China, who received a ca. 25% genetic contribution from ancestors of European origin, which shows in their genes and, albeit qualitatively, in their phenotype and dress (9). Their mummies, the oldest of which are from 3,800 years ago, show unquestionable evidence of European origins in their physical and cultural traits. They are probably descendants of people speaking Tocharian, an extinct Indo-European language. The residual 75% of their genotype must come from admixture with neighbors: 1% gene flow per generation (a very modest quantity) would be enough to cause the level of admixture observed (8).
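
The closing estimate can be checked with a simple one-way gene-flow model. The sketch below is only an illustration under stated assumptions and is not the calculation used in ref. 8. It assumes that in each generation a constant fraction m of the gene pool is replaced by genes from neighbors, so that the ancestral share after t generations is (1 - m)^t. The 25-year generation time and the function names are assumptions introduced here; the 1% rate, the 25% ancestral share, and the 3,800-year span come from the text.

```python
import math


def ancestral_fraction(m: float, generations: float) -> float:
    """Share of the original gene pool left after `generations` of
    constant one-way gene flow at rate `m` per generation."""
    return (1.0 - m) ** generations


def generations_to_reach(fraction: float, m: float) -> float:
    """Generations needed for the ancestral share to fall to `fraction`."""
    return math.log(fraction) / math.log(1.0 - m)


M = 0.01                # gene flow per generation (from the text)
YEARS = 3800            # age of the oldest mummies (from the text)
GENERATION_YEARS = 25   # ASSUMPTION: not stated in the text

generations = YEARS / GENERATION_YEARS
remaining = ancestral_fraction(M, generations)
needed = generations_to_reach(0.25, M)

print(f"generations elapsed: {generations:.0f}")
print(f"ancestral share remaining: {remaining:.3f}")
print(f"generations needed to reach 25%: {needed:.1f}")
print(f"years needed to reach 25%: {needed * GENERATION_YEARS:.0f}")
```

Under these assumptions, 3,800 years correspond to 152 generations and leave an ancestral share of roughly 22%, while a 25% share is reached after about 138 generations. Both figures are consistent with the statement that 1% gene flow per generation is enough to account for the observed admixture. A longer assumed generation time would leave a somewhat larger ancestral share.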





Publication date: 1998